Academic Torrents: Scalable Data Distribution

Authors

  • Henry Z. Lo
  • Joseph Paul Cohen
Abstract

As competitions get more popular, transferring ever-larger data sets becomes infeasible and costly. For example, downloading the 157.3 GB 2012 ImageNet data set incurs about $4.33 in bandwidth costs per download. Downloading the full ImageNet data set takes 33 days. ImageNet has since become popular beyond the competition, and many papers and models now revolve around this data set. To share such an important resource with the machine learning community, the sharers of ImageNet must shoulder a large bandwidth burden. Academic Torrents reduces this burden for disseminating competition data, and also increases download speeds for end users. By augmenting an existing HTTP server with a peer-to-peer swarm, Academic Torrents re-routes requests to fetch data from other downloaders. While existing systems slow down with more users, the benefits of Academic Torrents grow, with noticeable effects even when only one other person is downloading.

Similar resources

Dynamic swarm management for improved BitTorrent performance

BitTorrent is a very scalable file sharing protocol that utilizes the upload bandwidth of peers to offload the original content source. With BitTorrent, each file is split into many small pieces, each of which may be downloaded from different peers. While BitTorrent allows peers to effectively share pieces in systems with sufficient participating peers, the performance can degrade if participat...

A Torrent Recommender based on DHT Crawling

The DHT Mainline is a significant extension to the BitTorrent protocol. The DHT Mainline has several million users and is the largest DHT network. This thesis uses the DHT Mainline to generate a recommendation system for torrents. A program was written crawling the entirety of the torrent search engine kickass.to gathering metadata about torrents. The DHT Mainline was then crawled to search for...

The Pirate Bay Torrent Analysis and Visualization

Using C# as a parser, we process about 3.4 million pieces of data over 680 thousand torrents from thepiratebay.org, and create a graphical representation of the data as an infographic. The infographic presents the information in an easily readable format and can also be distributed across many web mediums. Based on the representation/analysis of the data, we are able to determine some interesting cha...

Access control in ultra-large-scale systems using a data-centric middleware

  The primary characteristic of an Ultra-Large-Scale (ULS) system is ultra-large size on any related dimension. A ULS system is generally considered as a system-of-systems with heterogeneous nodes and autonomous domains. As the size of a system-of-systems grows, and interoperability demand between sub-systems is increased, achieving more scalable and dynamic access control system becomes an im...

Angling for Big Fish in BitTorrent

BitTorrent piracy is at the core of fierce debates around network neutrality. Most of the legal actions against BitTorrent exchanges are targeted toward torrent indexing sites and trackers. Surprisingly, little is known about the initial seeds that insert contents on BitTorrent and about the highly active peers that are present in a large number of torrents. The main reason is that acquiring th...

Journal:
  • CoRR

Volume: abs/1603.04395

Publication date: 2016